PyTables: Processing And Analyzing Extremely Large Amounts Of Data In Python
Abstract
Processing large amounts of data is a necessity in scientific fields such as meteorology, oceanography, astronomy, astrophysics, experimental physics, and numerical simulation, to name only a few. Existing relational or object-oriented databases are usually good solutions for applications in which multiple distributed clients need to access and update a large, centrally managed database (e.g., a financial trading system). However, they are not optimally designed for efficient read-only queries on pieces of objects, or even on single attributes of objects, which is a requirement for processing data in many scientific fields such as those mentioned above. This paper describes PyTables [1], a Python library that addresses this need by enabling the end user to easily manipulate scientific data tables and regular homogeneous Python data objects (such as Numeric [2] arrays) in a persistent, hierarchical structure. The underlying hierarchical data organization is built on the excellent HDF5 [3] C library.
Publication date: 2003